24 Cathedral Road / 24 Heol y Gadeirlan

Cardiff / Caerdydd

CF11 9LJ

Tel / Ffôn: 029 2032 0500

Fax / Ffacs: 029 2032 0600

Textphone / Ffôn testun: 029 2032 0660

info@audit.wales / post@archwilio.cymru

www.audit.wales / www.archwilio.cymru

Nick Ramsay AM

Chair of the Public Accounts Committee

National Assembly for Wales

Cardiff Bay

Cardiff

CF99 1NA

Reference:    HVT/2852/caf

Date issued: 8 June 2018

 

Dear Nick

Incidents affecting availability of national clinical information systems

Given that the Committee is still in the process of finalising its inquiry on NHS informatics services, I thought I should bring to your urgent attention some information my staff have picked up in relation to the resilience of major NHS IT systems in Wales. We have recently learnt that a number of NWIS-maintained systems have suffered “major incident” failures over the last 12 months. The situation is helpfully summed up in a paper that was recently presented to the Velindre NHS Trust board, and which I have enclosed.

The incidents of 24 January 2018 and 21 March 2018 appear to have caused particular concern given they affected a number of key clinical systems, resulting in significant local problems for NHS bodies in respect of maintaining routine clinical activity while the systems were down. I am aware that Abertawe Bro Morgannwg University Health Board has raised concerns in respect of these two incidents and sought assurances from the Chief Executive of Velindre NHS Trust, although we have not yet seen any response. We are not aware of patients coming to harm as a result of the system failures, although it can reasonably be expected that the issues will have contributed to negative patient experiences and significant frustration on the part of NHS staff. Indeed, as I write this letter I can see that news on the IT problems is already starting to filter through to the media. I understand that NWIS has taken immediate action to investigate and address each of the system failures; however, apparent delays in the service receiving formal investigation reports from NWIS mean that it is not possible to pinpoint the cause of the problems. Anecdotal evidence suggests that the system failures are the result of various different factors, possibly relating to underlying infrastructure issues.

 

Page 1 of 2 - Incidents affecting availability of national clinical information systems - please contact us in Welsh or English / cysylltwch â ni’n Gymraeg neu’n Saesneg.

I must stress that these issues have come to light since I reported my findings on NHS informatics systems earlier this year, and it is important to note that this work did not examine issues relating to the resilience of systems or business continuity arrangements.  As such I am not able to draw on any specific audit work in this area. However, from the information we do have it has struck me that there could be some read across to the wider systemic challenges identified in my recent report and the Committee’s subsequent inquiry.  As such I thought it was appropriate that I draw the Committee’s attention to these issues so that it can consider whether or not to extend its inquiry in some way to include these significant concerns.

Yours sincerely

Huw Vaughan Thomas

Auditor General for Wales

 

 

Enc:   Briefing on CANISC Major Incidents and Business Continuity



 

 

TRUST BOARD 

 

 

 

BRIEFING ON CANISC MAJOR INCIDENTS AND BUSINESS CONTINUITY 

 

 

Meeting Date: 30th May 2018

Authors: Ann Marie Stockdale and Lisa Miller

Sponsoring Director: Andrea Hague and Mark Osland

Report Presented by: Andrea Hague

Committee/Group who have received or considered this paper: None

 

Trust Resolution to: (please tick)    

APPROVE:

 

REVIEW:

 

INFORM:

 

ASSURE:

 

Recommendation:

The Trust Board are asked to note the content of this paper and request that the Executive Management Board consider the options in more detail leading to a final option being taken forward as a matter of urgency.

 

 

This report supports the following Trust objectives as set out in the Integrated Medium Term Plan: (please tick)

Equitable and timely services

 

Providing evidence based care and research which is clinically effective

 

Supporting our staff to excel

 

Safe and reliable services

 

First class patient /donor experience

 

Spending every pound well

 

ACRONYMS

 

 

This report supports the following Health & Care Standards:

http://www.wales.nhs.uk/governance-emanual/health-and-care-standards

Staying Healthy

Safe Care

Effective Care

Dignified Care

Timely Care

Individual Care

Staff and Resources

 

1.             Introduction / Background

 

1.1         This paper has been produced to brief the Trust Board regarding ongoing major incidents with the Canisc system within VCC.

 

1.2         There are a number of IT systems and supporting infrastructure in use at Velindre Cancer Centre that provide functions critical for the safe delivery of patient care. [...] record and administration system which was developed over 20 years ago.

1.3         This paper focuses specifically on incidents categorised as major at a local and/or national level.  ServicePoint is the tool in place for the recording of such incidents for resolution and Datix incident reports are completed in line with Trust Policy.

 

1.4         These National clinical and administration critical systems are hosted by the NHS Wales Informatics Service (NWIS) out of the Newport and Blaenavon Data Centres.  

 

1.5         Major incidents affecting the availability of national clinical critical systems including, but not limited to, Canisc and the Welsh Laboratory Information Management System (WLIMS) have increased in frequency, with a total of 11 incidents occurring since 17 April 2018, summarised below along with the NWIS definitions:-

 

Date                  System                                                            Timescale

17 April 2018         National Canisc Incident                                          12:23 to 15:39

24 April 2018         National Data Centre Incident (Blaenavon)                         20:30 to 14:00 25 April 2018

30 April 2018         National Canisc Incident                                          13:23 to 13:46

30 April 2018         National Welsh Laboratory Information Management System (WLIMS)  15:18 to 17:30

2 May to 8 May 2018   National Canisc Incident                                          22:00 to 14:45

14 May 2018           National Canisc Incident                                          09:10 to 19:00

14 May 2018           National Welsh Laboratory Information Management System (WLIMS)  11:58 to 18:11

15 May 2018           National Welsh Laboratory Information Management System (WLIMS)  14:45 to 18:22

16 May 2018           National Canisc Incident                                          07:50 to 11:30

17 May 2018           National Canisc Incident                                          08:30 to 20:01

22 May 2018           National Canisc Incident                                          08:00 to 13:40

 

Definition of Major Incidents (NWIS)

Calls - Priority 1 (P1)

P1s are one or more Incidents which cause significant business impact, resulting in the loss of a critical clinical or administrative service within one or more core sites. These would typically be incidents which:

Cause unavailability of the Service, a key module or a major function;

OR Cause incorrect processing of data or errors in a major Software function;

AND Affect multiple Authority Party Sites.

Calls - Priority 2 (P2)

P2s are one or more Incidents which cause localised business impact resulting in the loss of normal clinical or administrative service, to one site or multiple non-core sites. These would typically be incidents which:

Cause disruption to a number of Users or unavailability to a single user within a core site of the Service, a key module or a major function;

OR Cause incorrect processing of data or errors in a Software function;

AND Affect some users at multiple Service Recipients or all users at a single site.
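For illustration only, the priority definitions above can be read as the rule "(first condition OR second condition) AND third condition". The short Python sketch below encodes that reading. It is not an NWIS tool: the class, the field names and the "unclassified" outcome are hypothetical, and the grouping of OR before AND is an assumption about how the definitions are meant to be applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # All field names are hypothetical paraphrases of the NWIS wording.
    service_unavailable: bool       # Service, key module or major function unavailable
    major_function_errors: bool     # incorrect processing or errors in a major Software function
    affects_multiple_sites: bool    # multiple Authority Party Sites affected
    user_disruption: bool = False   # a number of users disrupted, or one user at a core site without service
    function_errors: bool = False   # incorrect processing or errors in a Software function
    localised_scope: bool = False   # some users at multiple Service Recipients, or all users at one site


def classify(incident: Incident) -> str:
    """Return "P1", "P2" or "unclassified", assuming the reading (A OR B) AND C."""
    p1_effect = incident.service_unavailable or incident.major_function_errors
    if p1_effect and incident.affects_multiple_sites:
        return "P1"
    p2_effect = incident.user_disruption or incident.function_errors
    if p2_effect and incident.localised_scope:
        return "P2"
    # Anything else falls outside the two definitions quoted in this paper.
    return "unclassified"
```

Under this reading, an outage of a national system across several sites would be classed as P1, while the same outage confined to a single site would only reach P2 if the localised conditions were also met.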

 

1.6         The summary above reflects the incident resolution timescale only, not the recovery timescale or associated cost specific to the Cancer Centre.   

1.7         At the time of providing this report, initial NWIS feedback has referred to the root cause being underlying infrastructure issues for the majority of incidents; however, a formal response has not been received.

1.8         The Trust IG&T Committee has highlighted concerns to the NWIS Directors regarding the delay in receiving formal incident investigation reports. Please see appendix 1 for detail.

1.9         However, in terms of impact, this has resulted in:-

Lack of information to prescribe chemotherapy/radiotherapy

Delays/risk of error in prescribing chemotherapy/radiotherapy

Inability to access blood results, essential to support decision making in relation to patient treatment/care

Decisions made without all information to hand, requirement for retrospective checks to be made, leading to an increased likelihood of error

Poor patient experience ([...] of time)

Potential impact on clinical staff attrition/retirement

Added stress to overburdened work force 

Negative experience and could impact staff retention where it is hard to recruit (in light of medical staff shortages etc)

Consultants preparing every clinic (often in personal time at home) in the expectation IT systems cannot be accessed - stress levels etc

1.10 All incidents were immediately reported to the NHS Wales Informatics Service, who then commenced their internal investigation in line with their service management processes.  

1.11 Velindre Cancer Centre implemented its business continuity plan, including emergency communication with key staff on a regular basis.

 

2.             Timing:

 

2.1.       This paper will advise Trust Board members of the current position. The Executive Management Board will discuss the options in more detail when the financial and implementation times are known. The EMB will then monitor the progress and provide regular reports to the Trust Board. 

 

2.2.       The current business continuity plan cannot be sustained and, more importantly, does not provide a full solution to an incident. Therefore, urgent action is required.

 

3.             Description:

 

3.1.       The Trust IG&T Committee received a report on major incidents and discussions took place relating to recent IT systems issues during week commencing 14th May 2018. The Committee discussed the risks to the service, impact upon patient experience and staff wellbeing. The Chair also reported that this issue was raised many times during her induction walkaround at VCC.

 

3.2.       VCC have business continuity plans in place which are tested regularly in real time due to the frequency of incidents. These are a combination of providing and printing the paper record and access to some other systems such as WPAS.

3.3.       Whilst the implementation of Welsh Clinical Portal, Chemocare and LIMS has enhanced continuity arrangements, the reliance upon CANISC as a Patient Administration System (PAS) remains a risk as these systems do not replace the full functionality.

3.4.       It is accepted that there is a programme to replace CANISC but at this stage there are no definitive timescales for this work to be completed. 

 

3.5.       VCC SMT now require that more robust and enhanced continuity arrangements be implemented as a matter of urgency. This will mean reprioritisation of work and the understanding that some work will need to cease to allow the relevant departments to focus upon this work.

 

An assessment of the options available is set out below. However, at this stage the timescales and financial implications for options 3 and 4 are unknown. The ADI and Head of Information at VCC are progressing these with the relevant colleagues. Once these are known the full option appraisal will be provided to the EMB.

 

3.5.1.         Option 1: Do nothing - best endeavours on a case by case basis.

3.5.2.         Option 2: Revert to recording all information in the physical medical record. It is recognised that this is inefficient, but it will ensure that all Canisc case note information is available to all staff as required, enabling safe care.

3.5.3.         Option 3: Copy of the read-only Canisc database to be mirrored at Velindre Cancer Centre. This option would need to be undertaken with NWIS. In the event of a network incident at the Cancer Centre, the read-only version of Canisc will become unavailable.

3.5.4.         Option 4: Development of the Velindre Cancer Centre data warehouse to include data extracts specific to new patient and follow-up annotations.

 

3.6.       Alongside the above options there are 3 key pieces of work required to support business continuity for the Cancer Centre. These are:-

 

3.6.1.         Canisc Case Note Summary to be made available in the Welsh Care Record Service

3.6.2.         Document Management System (VCC documents available in the Welsh Care Record Service); and

3.6.3.         Welsh Clinical Portal link to the e-Master Patient Index

The Medical IT lead has identified the above as the minimum to enable safe care that must occur alongside the options.

4.             Financial Impact:

 

4.1.       Unknown at this stage.

5.             Quality, Equality, Safety and Patient Experience Impact

 

5.1.       There is no evidence to suggest that patients have been harmed but it is evident that the patient experience is poor when such incidents occur. 

 

5.2.       There is acknowledgement of a high profile (UK) case citing a computer failure requiring a doctor to obtain results over the telephone and a cognitive error being made. This led to the junior doctor [...] lessons learnt in relation to current Trust services and pressures.

 

6.             Considerations for Board / Committee 

 

6.1.       Trust Board are asked to note the contents of this paper and task the EMB with taking forward the most appropriate option as a matter of urgency.

6.2.       [...] for both VCC and outreach clinics remains in place, which requires the printing of or saving of case notes in an alternative format. This clearly has a significant resource implication and has led to some tasks or work ceasing in order to prioritise this service continuity plan.

7.             Next Steps:

 

7.1.       Trust Board are asked to consider this paper.

 

7.2.       ADI and VCC Head of Information to finalise options with financial and timescale information.

7.3.       Chief Executive to ensure options are discussed at EMB and an appropriate recommendation is resourced and achieved.  

Appendix 1

Quarter 1 Update

During Quarter 1 2017, a total of six major incidents were reported.  The status in relation to each incident is detailed below.

INC 01   LIMS National Incident 24 May 2017

Status: Incident resolved. Awaiting Major Incident Report from NWIS, due September 2017.

INC 03   LIMS National Incident 20 June 2017

Status: Incident resolved. Awaiting Major Incident Report from NWIS, due October 2017.

INC 04   Canisc Interface Feeds 19 June 2017

Status: Incident resolved. Awaiting Major Incident Report from NWIS, due October 2017.

INC 06   Canisc National Incident 20 June 2017

Status: Incident resolved. Awaiting Major Incident Report from NWIS, due October 2017.

INC 07   Velindre Cancer Centre Network 11 July 2017

Status: Incident resolved. Root cause analysis completed (see February 2018 report).

INC 08   Network Issues at Cardigan Leisure Centre (WBS Donation Clinic)

Status: Incident resolved. No further incidents of a similar nature. Equipment to test local Wi-Fi resilience has been purchased and will be used when undertaking risk assessments of potential new venues for donation clinics.

Quarter 2 Update  

INC 01   Radiology Investigation Reports unavailable in Canisc 22 August 2017

Status: Incident re-opened January 2018. Due to issues with setting up the end to end test environment, limited resources and operational issues, NWIS have delayed the testing of the Radis/Canisc bug fix until 09.04.18. Root Cause Analysis in draft.

INC 02   Unavailability of Network (National Incident) 15 September 2017, 09:30

Status: Incident resolved. Awaiting Major Incident Report from NWIS, due December 2017.

INC 03   Unavailability of Network (Local Incident) 15 September 2017, 14:30

Status: Incident resolved. Root cause identified at time of incident and de-escalated. Investigation completed and reported in DATIX.

Quarter 3 Update

No incidents

Quarter 4 Update

INC 01   Unavailability of Network (National Incident) 24 January 2018, 11:45

Status: Incident resolved. All Wales Major Incident declared following connection issues with both national data centres. A range of clinical and admin critical systems were unavailable for a period of approx. 4 hours. Systems affected included CANISC, WCP, WLIMS (Cardiff & Vale UHB), all Trust email services and WHTN phone lines. Whilst still operational, the WBS experienced performance issues with its core ePROGESA system, as well as some label and report printing issues. The loss of email meant donor registry-to-registry messaging (used for donor/patient matching) was unavailable for the period of the outage. The outage directly impacted on patient services in VCC, e.g. radiotherapy patients delayed, unable to access test results, unable to schedule and manage outpatient attendances. NWIS have confirmed the cause as an equipment failure in the Newport Data Centre following delivery [...] submitted to Welsh Government. Major Incident Report awaited. The Trust is awaiting further feedback on the root cause and future mitigation(s).

INC 02   Unable to Access Canisc (National Incident) 1 February 2018

Status: Incident resolved. Connection problem identified when users attempted to launch Canisc via the national Blaenavon Data Centre. Major Incident Report awaited.

INC 03   Users Unable to Access WBS Online Booking System (National Incident) 20 March 2018 @ 18:05

Status: Incident resolved. A planned change to national firewalls by NWIS (ServicePoint ref: 57202) impacted on services using the outer Demilitarized Zone (DMZ), which includes the WBS online booking system. Users were unable to access the website for approx. 2 hours. WBS / Velindre were not notified of the planned change, nor was the online booking system [...] ongoing with NWIS to understand root cause and ensure WBS / Velindre services are appropriately referenced within NWIS Service Catalogue.

INC 04   National System Outage 21 March 2018 @ approx. 16:00

Status: Incident resolved. Network issue in Blaenavon Data Centre (BDS) resulted in failure of a number of clinical and admin critical systems for a period of approx. 2 hours. Trust email services affected. In VCC access to XXXX was restricted, which resulted in XXXX. Full connectivity was restored by NWIS shortly after 18:30. The NHS Wales Informatics Service [...] further feedback on the root cause and future mitigation(s).

INC 05   National Loss of Access to WLIMS 29 March 2018, 15:56 to 17:10

Status: Incident resolved. All Wales incident preventing users from accessing LIMS. Messages not flowing - users unable to send test requests and receive test results. Root cause identified as a change to increase capacity on the Lab Database, which impacted the Citrix servers.